Problem fix [v2] #13

sumwailiu · 2025-02-15T03:43:14Z

The first version of this fix (#12) contained some mistakes, even though it did produce better final results.

Correction:

The original computation of reward_loss is in fact correct and should not be modified.

Main changes:

[Fix 1] The computation of returns in main.py now matches the one at line 187 of dreamer.py in Dreamer (TensorFlow 2 implementation).
[Fix 2] Although the default planning horizon is 15 in both the original paper and dreamer-pytorch, the effective horizon in dreamer-pytorch was actually 15-1=14. This minor problem is now fixed.
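
For readers unfamiliar with the return computation mentioned in Fix 1, here is a minimal sketch of a λ-return over one imagined trajectory, assuming the returns in question are the λ-returns used by Dreamer. It follows the recursion R_t = r_t + γ_t ((1 - λ) v_{t+1} + λ R_{t+1}), with the final step bootstrapped from a value estimate. The function name, argument names, and the use of plain Python lists are my own choices for illustration; this is not the code from main.py or dreamer.py.

```python
def lambda_return(rewards, values, discounts, bootstrap, lambda_=0.95):
    """Compute lambda-returns for a single trajectory (illustrative sketch).

    rewards[t], values[t], discounts[t] are scalars for step t.
    bootstrap is the value estimate for the state after the last step.
    """
    horizon = len(rewards)
    # Value of the state that follows each step; the last one is the bootstrap.
    next_values = list(values[1:]) + [bootstrap]
    returns = [0.0] * horizon
    last = bootstrap
    # Work backwards so each step can reuse the return of the next step.
    for t in reversed(range(horizon)):
        mixed = (1.0 - lambda_) * next_values[t] + lambda_ * last
        returns[t] = rewards[t] + discounts[t] * mixed
        last = returns[t]
    return returns
```

With lambda_=0 this reduces to one-step TD targets, and with lambda_=1 it becomes a discounted sum of rewards plus the bootstrapped final value.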

Factors contributing to issue #4:

@letusfly85 and @coderlemon17 found that their training results were not as good as the test results reported in the original paper (around 700 at 1M steps). This is expected, because noise is added to the actions during training and removed during testing. Here are the training and testing results of the original dreamer-pytorch (i.e., the version before applying my fixes) in the walker-run environment, where epoch 1000 corresponds to 1M steps.
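
To make the train/test difference described above concrete, here is a small sketch of action selection with and without exploration noise. The function name, the Gaussian noise model, the default noise_std of 0.3, and the [-1, 1] action bounds are assumptions for illustration only; they are not taken from the dreamer-pytorch code.

```python
import random


def select_action(mean_action, explore, noise_std=0.3, low=-1.0, high=1.0, rng=random):
    """Return the policy action, optionally perturbed by Gaussian noise.

    explore=True mimics training (noisy actions), explore=False mimics testing.
    All default values here are hypothetical.
    """
    if explore:
        noisy = [a + rng.gauss(0.0, noise_std) for a in mean_action]
    else:
        noisy = list(mean_action)
    # Keep every action component inside the valid range.
    return [min(max(a, low), high) for a in noisy]
```

Because the noisy actions deviate from what the policy would choose, episode returns logged during training can be expected to sit below the returns of noise-free test episodes.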
The planning horizon is another factor. Here is the result of dreamer-pytorch + Fix 2, where the testing return is around 700 at epoch 1000 (1M steps).
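
As a hypothetical illustration of the kind of off-by-one addressed by Fix 2, the sketch below rolls a toy transition function forward for a given number of steps. Looping planning_horizon - 1 times yields 14 imagined states instead of 15. The rollout function and the toy step function are invented for this example and do not reproduce the actual dreamer-pytorch imagination code.

```python
def rollout(start_state, step_fn, num_steps):
    """Apply step_fn repeatedly and collect the imagined states."""
    states = []
    state = start_state
    for _ in range(num_steps):
        state = step_fn(state)
        states.append(state)
    return states


def toy_step(state):
    # Stand-in for a learned transition model.
    return state + 1


planning_horizon = 15
# Off-by-one variant: one imagined step fewer than intended.
short_rollout = rollout(0, toy_step, planning_horizon - 1)
# Intended variant: exactly planning_horizon imagined steps.
full_rollout = rollout(0, toy_step, planning_horizon)
```

Here short_rollout holds 14 states and full_rollout holds 15, so returns computed over the shorter rollout look one step less far into the future.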